ABSTRACT

The validity of public examinations as measures of academic 
achievement is not perfect, and the generalizability of paper-and-pencil tests to 
real-life tasks is rather low. In the United Kingdom, small differences between levels 
of attainment on public examinations cannot be attributed to real differences in 
achievement. The comparability of achievement tests is reduced by changes over time, 
place, examination board, mode of examining, subject, and syllabus. A major thrust of 
this paper is to suggest that a consideration of standards or effectiveness is not a 
simple matter of counting and comparing. In fact, there is no real evidence of failing 
educational standards over time in Britain and no convincing evidence of 
underperformance relative to the educational systems of other developed nations. 
International comparisons and those based on local education agencies do suggest that 
comprehensive systems of schools based on parental choice tend to produce narrower 
social differences in intake and outcomes. Systems with more differentiation have 
greater gaps in attainment between social groups. The United Kingdom is in a 
reasonable comparative position. There are problems related to education, certainly, 
but the current examination system was designed to differentiate between candidates. 
This differentiation cannot be used, logically, as evidence of underattainment. 
Summary






‘Achievement’ at school generally describes levels of attainment in public examinations such 



The validity of public examinations as measures of achievement is not perfect. The
generalisability of pencil and paper tests to real-life tasks is rather low.

Public examinations are not wholly reliable. Therefore, small differences between levels of 
attainment cannot be attributed to real differences in achievement. 

Fair and rigorous comparisons cannot be made between different forms of attainment. 
Comparability is reduced by changes over time, place, exam board, mode of examining, 
subject and syllabus. 

Differences in attainment cannot be calculated by simple subtraction. They must be 
proportionate, contextualised, and hedged around with doubts about the underlying 
distribution of the scores. 

‘Underachievement’ is used to describe a range of phenomena. These range from the 
differential attainment of groups of school students (such as those formed by nation, region, 
ethnicity, language, school type, sex and social class) to the failure of an individual student 
to attain a level equivalent to the best prediction of their future performance (value-added or 
contextualised). 

There are problems of unreliability and invalidity in the categories frequently used to define 
groups of underachievers (such as social class and ethnicity). As the unreliability of 
attainment measures and classifying variables increases so does the chance of spurious 



Once operationalised, there is no convincing evidence for any of these forms of 
underachievement. 

In the UK, there is an absence of appropriate experiments to assess the reasons why some 
groups do less well in compulsory schooling. Only experimental designs can test causal 
models leading to fruitful ameliorative action. Filling this gap was the main purpose of the 
thirty million pounds spent on the ESRC-controlled Teaching and Learning Research 
Programme. This purpose is unlikely to be met. 

Given this lacuna, we are left with post hoc analyses of large datasets seeking cause by 
statistical manipulation, and small-scale studies of ‘qualitative’ data often not seeking causes
at all. Both approaches have significant defects. This paper focuses on the former approach, 
but the problems generally encountered in the latter approach are even greater in terms of 
rigour, generalisability and comprehensibility. 
• There is no reason to assume that achievement in the UK is worse than in comparable
nations. Nor is there any evidence for the much-cited notion that results in the UK are more 
polarised. 

• There is no reason to assume that achievement in different parts of the UK, or in different
types of schools, is different for equivalent students.

• There is no reason to assume that achievement differs between social groups, as defined by 
ethnicity, social class, language or sex (for otherwise equivalent students). 

• The differences in raw-score attainment in the above groups disappear in either a value- 
added or a contextualised analysis. 

• There is some evidence that achievement in state-funded schools is improving over time, and 
that, contrary to popular reports, the gaps in attainment between identifiable groups are 
declining. 

• Much public money is being spent on research that cannot produce the answers required of it, 
and on policies to ameliorate growing gaps in attainment that do not exist. 

There is insufficient space here to argue each of the above closely with full supporting evidence.
Instead the outline below uses references to published peer-reviewed material, available upon
request, to supplement the examples of research given.



1. Examinations and comparability 

1.1 There have long been complaints that standards of attainment in UK education have fallen 
over time (Cresswell and Gubb 1990, National Commission on Education 1993, Barber 1996), 
that they are poor in comparison to similar countries (Boyson 1975, Prais 1990, Skills and 
Enterprise Network 1999), and that standards are particularly poor for the lowest achievers 
(Postlethwaite 1985, Bentley 1998, DfES 2001). Therefore, the UK is supposed to have a 
uniquely polarised assessment system, with excellent results for some and a long tail of 
underachievers. Claims such as these are quite common, and contribute to what has become a 
‘crisis account’ of the state of the UK education system and its schools (Gorard 2000a). 

1.2 However, judging standards is difficult without having a close definition of the term 
'standard'. As an illustration of how elastic the term can be, consider the very real situation in 
which an educational attainment indicator such as a GCSE becomes more common over a period 
of ten years. One group of commentators may claim that standards have therefore improved,
because more students now attain the GCSE standard. Their opponents may claim that standards 
have fallen, since the GCSE is now demonstrably easier to obtain and also worth less in 
exchange. The point to be made here is that knowledge is not a static commodity, and 
comparisons of changes over time in school attainment have to try and take these changes into 
account. One analogy for the complaint by the National Commission on Education (1993) that 
number skills have deteriorated for 11-15 year olds, would be the clear drop over the last 
millennium in archery standards among the general population. If the number of children 
knowing the meaning of the word 'mannequin' drops from the 1950s to the 1970s, is this evidence of
some kind of decline in schooling? Perhaps it is simply evidence that words and number skills 
have changed in their everyday relevance. On the other hand, if the items in any test are changed 
to reflect these changes in society, then how do we know that the test is of the same level of 
difficulty as its predecessor? In public examinations, by and large, we have until now relied on 
norm-referencing. That is, two tests are declared equivalent in difficulty if the same proportion
of matched candidates obtain each graded result on both tests. The assumption is made that 
actual standards of each annual cohort are equivalent, and it is these that are used to benchmark 
the assessment. How then can we measure changes in standards over time (for there cannot be 
any, by definition)? But, if the test is not norm-referenced how can we tell that apparent changes 
over time are not simply evidence of differentially demanding tests? This apparently insuperable 
problem has, to my mind, not been adequately addressed (Gorard 2001a). 

1.3 Britain uses different regional authorities (local examination boards) to examine what are 
meant to be national assessments at 16+ and 18+ (Noah and Eckstein 1992). It is already clear 
that even qualifications with the same name (e.g. GCSE History) are not equivalent in terms of 
subject content as each board sets its own syllabus. Nor are they equivalent in the form of 
assessment, or the weighting between components such as coursework and multiple-choice. Nor 
is there any evidence that the different subjects added together to form aggregate benchmarks are 
equivalent in difficulty to each other. In fact, comparability can be considered between boards in 
any subject, the years in a subject/board combination, the subjects in one board, and the
alternative syllabuses in any board and subject. All of these are very difficult to determine,
especially as exams are neither accurate nor particularly reliable in what they measure (Nuttall 
1979). The system of statutory assessment is also producing a flood of complaints about 
irregularities and inconsistencies (Cassidy 1999). Pencil-and-paper tests have little generalisable 
validity, and their link to other measures such as occupational competence is generally very 
small (Nuttall 1987). 

1.4 The problems faced by researchers in international studies of student performance are even 
greater. These include the comparability of different assessments, the comparability of the same 
assessments over time, using examinations or tests as indicators of performance at all, the 
different curricula in different countries, the different standards of record-keeping in different 
countries, and the competitiveness (especially) of developing countries (see O'Malley 1998). Yet 
what international comparisons seek to do is solve not one but all of these problems, and more, at
once (Gorard 2000b).

1.5 A further problem is that simple differences between attainment scores are being routinely 
misrepresented by academics, policy-makers and the media, in a way that takes no account of 
their underlying distribution or their base rate (Gorard 1999a, Gorard and Taylor 2002b). 
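The point about base rates can be illustrated with a minimal sketch. The pass rates below are invented purely for illustration and are not taken from any of the studies cited; the function names are likewise hypothetical. The sketch compares a simple subtraction of percentages with a ratio that takes the base rate into account.

```python
# Hypothetical pass rates (percent) for two groups in two years.
# All figures are invented for illustration only.
rates = {
    "year 1": {"group_a": 10.0, "group_b": 5.0},
    "year 2": {"group_a": 40.0, "group_b": 30.0},
}

def point_difference(a, b):
    """Simple subtraction, in percentage points."""
    return a - b

def rate_ratio(a, b):
    """How many times larger the first rate is than the second."""
    return a / b

summary = {}
for year, r in rates.items():
    a, b = r["group_a"], r["group_b"]
    summary[year] = (point_difference(a, b), rate_ratio(a, b))
    print(year, "difference:", summary[year][0], "ratio:", round(summary[year][1], 2))
```

In this invented case the difference in percentage points doubles between the two years, while the ratio between the groups falls. A report based on subtraction alone would describe a growing gap; a proportionate reading would describe a shrinking one.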

1.6 In summary, it is extremely difficult to claim that small differences in ‘surface’ attainment 
between students represent real differences in achievement. 



2. Underachievement 

2.1 ‘Underachievement’ is now a widely used term in education policy and practice (Gorard 
2000c). It is used routinely to refer to nations, home nations and regions, to types and sectors of 
schooling, to physiological, ethnic and social groups, and to individuals. It has been used to 
mean simply low achievement, also lower achievement relative to another of these groups, and 
lower achievement than would be expected by an observer. These multiple uses lead to 
considerable confusion which, coupled with common errors in assessing the proportionate 
difference between groups, mean that significant public money has been spent attempting to
overcome problems that may not, indeed, exist (Gorard et al. 2001). Where underachievement is 
understood to mean a lower level of achievement by an individual (or group) than would be 
expected using a model based on the best available predictors, then the underachieving 
individuals have nothing in common (else that common factor would become part of the best 
prediction). If, instead, we reserve some predictors from our best model (sex or poverty, for 
example), we still find no evidence that underachievers have much in common (Smith 2002). In 
raw-score terms, we might say that a particular social group exhibits lower achievement (in the 
sense of publicly available figures relating to pencil and paper tests) than another, as in the case 
of some ethnic groups. Or we might say that there is differential attainment between groups, as 
in the case of males and females. This is very far from saying that the lower-attaining group 
could and should do better on that assessment. The term underachievement has conceptual and 
practical difficulties, which chiefly lie in determining what the 'under' is in relation to. When it is 
used in relation to peers, or prior attainment, or cognitive aptitude tests for example there is no 
clear way of separating it from errors in the baseline testing system. To assume, as the DfES and 
many researchers in this field appear to, that the assessment system is neutral (by sex, for 
example) and that any differential is related to achievement or performance seems peculiarly
naive. This is especially so in the light of the already acknowledged general unreliability of 
statutory assessments. Making explicit what we mean by underachievement is an important step 
towards accepting that, collectively, we may not really mean anything by it. 

2.2 The nature of formal assessments means that comparing standards over time (or between 
groups) is very difficult. If the same test is administered repeatedly year-on-year, so that we can 
assume the same level of difficulty over time, then there are potential practice effects. Any 
increase in test results could be due to familiarity with the test. On the other hand, where the test 
is changed every year to keep it up-to-date and prevent practice effects, then we have no way of 
knowing whether successive tests are of the same standard. Until 1987 this problem was largely 
overcome in public examinations by ‘norm-referencing’. An assumption was made that the test 
cohort every year was of the same ability, but that the test varied. So, instead of having a pass 
mark the test had a set pass proportion. For example, in O-level English perhaps 10% were given
the top grade every year. So, by definition, it was impossible to ask whether standards were 
rising year-on-year. The underlying assumption of exam marking was that standards did not 
change. The only change allowable was in the proportion of the age cohort entering any 
examination. Since 1987 the UK has moved to a system based largely on criterion referencing. 
Now, each grade is related to a description of what is required, and if the candidate gives 
evidence of this then the grade is awarded. Since 1987, therefore, standards have been allowed to 
vary. This has led to an annual increase in exam scores, but has also made it impossible to tell 
whether this is due to rising standards of candidates or a lowering of the standards of tests. In the 
absence of a valid independent benchmark, any discussion of relative educational standards in 
the UK is somewhat pointless. 
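The contrast between the two grading systems can be sketched as follows. This is an illustration under stated assumptions, not a description of how any examination board actually sets grades: the scores are simulated, the 10% quota echoes the O-level example above, and the required mark of 70 and the function names are hypothetical.

```python
import random

def norm_referenced_share(scores, top_share=0.10):
    """Share given the top grade when a fixed proportion must receive it."""
    n_top = max(1, round(len(scores) * top_share))
    # Mark of the last candidate who falls inside the quota.
    threshold = sorted(scores, reverse=True)[n_top - 1]
    return sum(1 for s in scores if s >= threshold) / len(scores)

def criterion_referenced_share(scores, required=70.0):
    """Share given the top grade when a fixed mark must be reached."""
    return sum(1 for s in scores if s >= required) / len(scores)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
cohort_a = [rng.gauss(50, 15) for _ in range(1000)]  # simulated scores
cohort_b = [s + 5 for s in cohort_a]  # same candidates, each scoring 5 marks higher

norm_a = norm_referenced_share(cohort_a)
norm_b = norm_referenced_share(cohort_b)
crit_a = criterion_referenced_share(cohort_a)
crit_b = criterion_referenced_share(cohort_b)

print("norm-referenced:", norm_a, norm_b)
print("criterion-referenced:", crit_a, crit_b)
```

Under the quota the share with the top grade stays at about 10% for both cohorts, so a rise in scores is invisible. Under the fixed mark the share rises, but the figures alone cannot show whether the candidates improved or the test became 5 marks easier.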

National achievement 

2.3 Similar problems arise when trying to compare results between countries. Here, the problems 
of different entry rates and different standardisation procedures are compounded by the different 
assessment systems, and even by differences in the educational systems (and, of course, the 
curricula) themselves. Where the same test is administered in each country (as in the Third 
International Mathematics and Science Study), re-consideration of the results shows that there is
no convincing evidence of ‘underachievement’ in the UK. UK scores are compared with
countries like: the US which has much fuller coverage of the curriculum underlying the test; 
Singapore where children do not advance through school years automatically (meaning that they 
were, on average, 6 months older than UK students in TIMSS); and even Thailand whose scores 
are based only on the 32% of the age cohort attending school. Where a different test is used for 
each country (perhaps more appropriate to the local curriculum), problems of comparability 
arise. How can we tell whether the baccalaureate in France or the Abitur in Germany is
equivalent in difficulty to the GCSE in the UK? 

2.4 Anyway, sixteenth place for England in TIMSS (Mathematics) is far from impressive, but
better than several countries including USA, Norway and Spain. Many of the other countries 
taking part also scored lower, but were omitted by the researchers from analysis as they did not 
meet the sampling requirements for the study. In this study of the attainment of 14 year-olds, one 
South American country submitted scores for a cohort averaging 16 years of age. Otherwise, the 
oldest average age is for Singapore at the top of the table in terms of score, and the youngest is 
for Iceland near the bottom. The linear correlation between age and score means that one would 
expect countries with older children in the test to have higher scores, and that nearly 30% of the 
variance in outcomes is explicable by differences in mean age alone. There are further problems 
with the study in terms of sampling, low response rate (below 50% for England, Keys et al. 
1996), inclusion or exclusion of students with special educational needs, overlap of standard 
errors, and motivation. Brown (1998) concludes that the information in international league 
tables is generally too flawed to be of any use at all. 

School achievement 

2.5 At the level of comparison between schools (department or teachers), school effectiveness 
work has attempted to describe the characteristics of a successful school in a way that could form 
the basis of a blueprint for school improvement. Ironically, the major undisputed outcome of all 
of this work has been the reinforcement of the importance of non-school context (Coleman et al. 
1966, Gray and Wilcox 1995). National systems, school sectors, schools, departments and 
teachers combined have been found to explain approximately zero to 20% of the total variance in 
school outcomes. In all studies this ‘effect’ is small, and the larger the sample used, the weaker is
the evidence of any effect at all (Shipman 1997) - and, of course, we could not be certain that it 
is an ‘effect’ since the underlying causal model remains opaque. The remainder of the variance 
in outcomes is explained by student background, prior attainment and error components. Despite 
this, most educational policies are based upon comparisons between schools that do not take 
these incontrovertible findings into account. Such policies include league tables of results, 
programmes of inspection, and national and regional targets, all of which have presented 
attainments in raw-score forms. When researchers have attempted to relate this small school- 
effect to school characteristics and processes, so producing a blueprint for school improvement, 
the results have generally been negligible. The factors making up a 'good' school are frequently 
nebulous (Ouston 1998) or tautological (Hamilton 1997). 

2.6 Where claims have been made regarding the superiority of schools in one or more home 
countries of the UK, the situation is somewhat easier to assess as the systems themselves are 
more similar. While both countries have very similar school systems, Wales, for example, has 
until recently produced lower exam scores at all levels than England. However, once levels of 
poverty have been taken into account, schools in Wales have produced results that are at least as
good as those in England (Gorard 1998a). Similar points can be made about differences between
types of schools within one home country (Gorard 1998b). To expect a school with many 
students in poverty to gain the same kind of exam success as a school with nearly no poor 
students at all, is ridiculous. Yet this is what raw-score comparisons (such as league tables) do.
Once levels of poverty, and other background factors, are taken into account in regression 
equations then there is no evidence that any type of school performs any better than any other. 
State-funded schools in the UK are also rapidly catching up with the exam scores of fee-paying 
schools (Gorard and Taylor 2002b). So the question is not about the underachievement of 
schools or regions. Rather it is why there is this link between poverty and attainment. 

2.7 Once their context is taken into account, there appear to be better and worse performing 
schools of all types and in all sectors. However, the overwhelming majority of variance in school 
results is predicted by the nature (or prior attainment) of the intake. Little variance is left to be 
labelled a 'school effect', and even this contains an error component of unknown size. Put 
another way, there is no clear evidence of schools having much systematic effect at all on the 
attainment of their students. It appears that each individual would achieve pretty much as they do 
in any school, and that school 'improvement' consists largely of admitting more high achieving 
students - whether through direct selection as in some specialist and all grammar schools, or 
indirectly via the admissions systems, as in faith-based and Foundation schools. 

2.8 Operationalising the concept of underachievement is key to appreciating which group of 
students succeeds at school and also in understanding the confusion between low achievement 
and underachievement. A recent study used detailed student-level data to measure and identify 
underachievement among a group of over 2000 year 9 secondary school students (Smith 2002). 
Over 30 variables which the academic literature cites as being linked to academic performance
(such as prior attainment, attitude towards school and receipt of free school meals) were used to 
predict the future examination performance of these students; any individuals who failed to fulfil 
their potential were considered to be underachieving. There was little to distinguish the 
underachieving students from their peers. While there were some working class boys who 
underachieved, for example, using this definition, there were others who overachieved. Indeed, 
students in the underachieving group came from across the ability range; therefore it was 
possible to have a high ability underachiever as well as a low ability underachiever. The best 
predictors of academic success were prior attainment and attendance at school (accounting for 
three quarters of the variation in examination outcome), with sex and social class accounting for 
a negligible amount of the variance. Students who came from more economically disadvantaged 
backgrounds, performed less well in the Key Stages 2 and 3 examinations in every subject, as 
well as being less regular attenders at school - disadvantages which far exceeded those between 
the sexes. However, these students were not disproportionately underachieving in terms of the
model. 
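The operational definition used here, attainment below the level predicted by a model, can be sketched in a few lines. This is not the model from Smith (2002), which used over 30 predictors; it is a minimal illustration with a single predictor, invented student records, and an arbitrary cut-off of 5 marks below prediction.

```python
# Hypothetical records: (student, prior attainment score, later exam score).
students = [
    ("s1", 40, 42), ("s2", 50, 55), ("s3", 60, 58),
    ("s4", 70, 74), ("s5", 80, 70), ("s6", 90, 91),
]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [prior for _, prior, _ in students]
ys = [exam for _, _, exam in students]
slope, intercept = fit_line(xs, ys)

# Residual = actual score minus the score predicted from prior attainment.
residuals = {name: exam - (intercept + slope * prior) for name, prior, exam in students}

threshold = -5.0  # arbitrary cut-off chosen for this sketch
underachievers = sorted(name for name, r in residuals.items() if r < threshold)
print(underachievers)
```

In these invented data the only student flagged is one with high prior attainment, and the lowest-attaining student is not flagged at all. Low achievement and underachievement, so defined, are different things. The residual also absorbs any measurement error in both scores, which is why the choice of cut-off cannot be given much weight.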

2.9 In summary, once the issues discussed in section 1 are taken on board it is difficult to
conclude that levels of attainment in the UK are poor, falling, or weak in comparison to other 
countries. It is difficult to conclude that any one sector or type of school is weaker than another. 
It is not possible to identify entire groups of students with a tendency to underachieve. It is 
possible to identify groups which attain lower scores - but the category which binds them 
together (such as sex or social class) is a ‘pseudo-explanation’ for their lower achievement (see 
below). There is some evidence that standards of attainment are improving over time. 
3. Achievement gaps



3.1 This section examines patterns of attainment polarisation in England at a variety of levels. 
The PISA study in 2000 involved all EU countries. National segregation by examination 
outcome (for reading - the only score with complete coverage) is largely explicable by the use of 
academic (and other forms of) selection (Smith and Gorard 2002a). In all countries there are 
small gaps between the performance of boys and girls in reading, in favour of girls. This gap is 
generally smaller in countries with the highest overall scores. Overall, the Scandinavian 
countries of Sweden, Finland and Denmark show less segregation on all indicators. The UK has 
below average segregation in terms of all indicators, despite a commonly held but unfounded 
view that segregation in the UK is among the worst in the world. 

3.2 Table 1 presents the results for reading performance according to the students’ score on the 
PISA indicator of wealth (Smith and Gorard 2002b). Students who fall into the lowest 10% by 
wealth perform less well on the reading tests. In general, countries with the lowest gap in reading 
performance between richest and poorest are also those that have relatively high scores, even for 
the poorest 10%. Finland, Ireland and the Netherlands have high scores for both groups, while 
France, Germany and Luxembourg with heavily selective systems have both very low scores for 
the poorest 10% and only average scores for the richest 90%. The UK has the fourth highest 
score for the poorest 10% and the third highest score for the richest 90%. In fact, the scores in 
the UK are so far from polarised that the reading score for the lowest 10% is higher than the 
overall score for most countries. There is no evidence here of the purported crisis of 
underachievement in UK education. However, all of the foregoing caveats also apply to these 
figures. 

Table 1 - Mean reading score according to PISA indicator of family wealth

Country        Poorest 10%    Richest 90%
Luxembourg     385            452
Portugal       422            483
Germany        454            504
Greece         456            475
France         465            509
Spain          469            499
Italy          472            492
Austria        477            502
Denmark        479            502
Belgium        489            519
Sweden         495            519
UK             502            529
Ireland        512            530
Finland        540            550
Netherlands    541            543
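The figures in the table can be re-examined directly. The sketch below copies the scores as printed and computes, for each country, the difference between the two wealth groups, along with the UK's rank on the score for the poorest 10%. The variable names are illustrative, and the caveats already given about comparability apply equally to anything derived from these numbers.

```python
# Mean reading scores as printed in the table: (poorest 10%, richest 90%).
reading = {
    "Luxembourg": (385, 452), "Portugal": (422, 483), "Germany": (454, 504),
    "Greece": (456, 475), "France": (465, 509), "Spain": (469, 499),
    "Italy": (472, 492), "Austria": (477, 502), "Denmark": (479, 502),
    "Belgium": (489, 519), "Sweden": (495, 519), "UK": (502, 529),
    "Ireland": (512, 530), "Finland": (540, 550), "Netherlands": (541, 543),
}

# Score difference between the richest 90% and the poorest 10%.
gaps = {country: rich - poor for country, (poor, rich) in reading.items()}

# Countries ordered by the score of their poorest 10%, highest first.
by_poorest = sorted(reading, key=lambda c: reading[c][0], reverse=True)
uk_rank_poorest = by_poorest.index("UK") + 1

# Countries whose richest 90% score below the UK's poorest 10%.
uk_poorest = reading["UK"][0]
below_uk_poorest = sorted(c for c, (_, rich) in reading.items() if rich < uk_poorest)

for country in sorted(gaps, key=gaps.get):
    print(f"{country:<12} gap {gaps[country]:>3}")
print("UK rank on poorest 10%:", uk_rank_poorest)
print("Richest 90% below UK poorest 10%:", below_uk_poorest)
```

On these figures the smallest differences are found in the Netherlands and Finland and the largest in Luxembourg and Portugal, with the heavily selective systems of Germany and France also towards the larger end.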
Gaps between groups



3.3 Policy-makers, media commentators, and academics have recently worked together to create 
a 'moral panic' about the underachievement of boys at school (see for example, DENI 1997, 
Dean 1998). Although each account may have minor variations, the dominant version is as 
follows. There was a fairly recent period when boys were out-performing, or at least out-scoring, 
girls at school. Then girls began to catch up in terms of school performance and qualifications. 
They have now overtaken the boys, and the gap between the genders is increasing over time. 
Boys are prevalent in terms of school failure, non-qualification, exclusion and special needs. 
This is a universal phenomenon unrelated to local socio-economic considerations. Boys are 
therefore underachieving (see Salisbury et al. 1999 for a fuller account of this literature). Since 
this much is apparently clear the next task is to overcome the disadvantage of boys by remedial 
action in schools. This task is being attempted by multiple action research projects (e.g. School 
of Education 1998) or by attempting to transfer strategies from schools presumed to show good 
practice because they have a lower gender gap in attainment than their peers (as in the DfES 
project on 'boys' underachievement' based in Cambridge).

3.4 In fact, very little of this dominant account has any validity. The confusion in this field can 
be seen in the fact that as late as 1997, some respected writers in this field still believed that boys 
were outscoring girls at GCSE (e.g. David et al. 1997), but that there was 'a closing gender 
performance gap in most subjects in GCSE' (p.99) with 'girls closing the gender gap' (p.102).
Recent re-analyses of the national figures for attainment from Key Stage 1 to A level have 
shown that the gaps between girls and boys have remained the same since the early 1990s, 
perhaps even declining slightly over time (Gorard et al. 1999). Where achievement gaps exist 
(and of the core subjects these only consistently appear in English, and Welsh in Wales), they are
at the highest levels of attainment, just as they are for gaps between the achievement of ethnic 
groups (Johnston and Viadero 2000). The nature and size of these gaps vary regionally, and are 
clearly related to socio-economic factors. In fact, once the complexity of factors and obstacles 
such as home background, school structure, and social skills are taken into account a simple 
gendered explanation of achievement does not work (Kutnick 2000). Nor, apparently, do the 
simplistic solutions being suggested to the problem, such as single-sex teaching (see Harker 
2000). According to the best records we have, boys have not attained higher grades (at 16+) than
girls for at least 25 years. In fact, it is not even clear that we have any reliable evidence that boys 
have ever done better than girls in compulsory schooling. 

3.5 There is currently no sizeable or consistent gender gap at the lowest level of attainment in 
any public examination for any subject for any Key Stage. Approximately the same proportions 
of boys and girls of the relevant age gain at least the lowest level of each qualification (such as 
Level 1 at Key Stage One). In addition, for Mathematics and Science (and a few other 
curriculum areas) there is no sizeable or consistent gender gap at any level of attainment. Put 
another way, the assessment system is largely gender-neutral. There are achievement gaps in 
several curriculum areas, most notably English, other languages, and humanities. Where these 
appear, they are greatest at the highest level of attainment, mostly affecting a minority of (the 
most able) children (Table 1). These gaps are not increasing over time. The gaps in some 
subjects remain relatively static, while some are declining slightly. It is also worth noting that in 
subjects where children are assessed both by teachers and by a task/test, then the task/test 
produces lower achievement gaps (i.e. it is more gender neutral). 
Table 1 - Achievement gap in favour of girls; GCSE English

Year    Entry    A*    A     B     C     D     E     F     G
1992    20       -     27    23    16    10    5     1     0
1993    20       -     31    24    16    10    5     2     0
1994    30       43    34    27    18    11    5     1     0
1995    10       44    35    24    16    8     4     1     0
1996    10       43    36    25    16    9     4     1     0
1997    20       43    35    25    15    9     5     2     1


[table entries represent the extent to which girls outnumber boys in each cell. 20 as an entry gap 
shows that 20% more girls sit the assessment. 16 as a gap at C grade shows that 16% more girls 
attain a C or above] 
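The exact formula behind the table entries is not stated here. The sketch below shows one proportionate index that is consistent with the description, on the assumption that the gap at each grade is the difference between girls and boys as a share of all those reaching that grade, less the corresponding gap in entry. Both that formula and the candidate numbers are assumptions made for illustration.

```python
def entry_gap(girls_entered, boys_entered):
    """Extent to which girls outnumber boys among those entered, as a percentage."""
    return 100.0 * (girls_entered - boys_entered) / (girls_entered + boys_entered)

def achievement_gap(girls_at_grade, boys_at_grade, girls_entered, boys_entered):
    """Gap among those reaching a grade or above, net of the gap in entry (assumed form)."""
    raw = 100.0 * (girls_at_grade - boys_at_grade) / (girls_at_grade + boys_at_grade)
    return raw - entry_gap(girls_entered, boys_entered)

# Hypothetical numbers of candidates, invented for illustration only.
girls_entered, boys_entered = 510, 490
girls_c_or_above, boys_c_or_above = 330, 270

e_gap = entry_gap(girls_entered, boys_entered)
a_gap = achievement_gap(girls_c_or_above, boys_c_or_above, girls_entered, boys_entered)
print("entry gap:", round(e_gap, 1))
print("achievement gap at C or above:", round(a_gap, 1))
```

An index of this kind is proportionate to the numbers involved, so it can be compared across grades and years in a way that a simple subtraction of pass rates cannot.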



3.6 Figure 2 shows that there are year-on-year fluctuations in the overall achievement gap (in 
favour of girls), but that girls have never scored lower than boys since 1974 (using the same kind 
of figures as in the table above). It also suggests that until 1987/88 the overall trend of the gap 
was relatively static with a low in 1978 and high in 1983. Just at the period when overall scores 
begin to rise, there is a sudden jump in the size of the achievement gap over a two year period 
until the gap stabilises again from 1988/89 to 1997/98. This could explain the perplexing, to 
some, finding that from 1992 to 1997 the gender gap in a subject-by-subject analysis remained 
constant (Gorard 2001b). In summary, the gender gap at GCSE is chiefly a phenomenon 
appearing between 1987 and 1989, and growing only during that same period. This information 
could be key to our understanding of the determinants of this gap. 



Figure 2 - Achievement gap in favour of girls attaining 5+ GCSE A*-C 








3.7 The differential attainment of boys and girls at 16+ has appeared over a relatively brief 
period since 1987, concurrent with major increases in qualification levels for the entire 16-year- 
old cohort. The introduction of the GCSE heralded several other major changes including the 
abolition of strict norm-referencing at O level which had previously worked to maintain results 
at a relatively constant level (Foxman 1997). This was linked to the largest ever annual increase 
in the proportion of those reaching the GCSE (or O level equivalent) benchmark in 1988, and the 
second largest in 1989. In addition, the publication of the results for the 16-year-old cohort 
replaced the previous School Leavers Survey (which had included results from children of other 
age-groups) and formed the basis for new school performance tables. It is surely no coincidence 
that the gender gap appeared at precisely the same time as these changes, along with the 
introduction of course-work assessment and the onset of the National Curriculum with SATs, 
which according to evidence from the Youth Cohort Study have all greatly increased the chances 
of success for those from 'poor' backgrounds (Dolton et al. 1999). 

3.8 The potential practical importance of such a basic finding cannot be over-estimated. Given 
that the gender gap is, in fact, related to both social class and levels of attainment, then the
appearance of a large gap just when children from poorer families began to score more highly 
begins to suggest possible explanations. Consider the following as one example of an implication 
for ameliorative strategies. If the notion of what constitutes 'work' and what is appropriate for 
home varies by occupational class, it may be that 'working-class' men, and their boys, do not 
bring work home, whereas 'working-class' women, and their girls, do. If so, strategies such as 
homework clubs or Saturday sessions at school may be more natural and therefore effective for 
such boys than homework pacts. Another possible conclusion to be drawn from this would be 
that differential attainment by sex is a product of the changed system and nature of assessments 
rather than any more general failing of boys, their ability, application, or the competence of those 
who teach them. Such a conclusion, that differences are highly dependent on the nature of 
assessment, would be supported by the recent debate over the apparent improvement in boys' 
literacy as a result of the literacy hour where sensitivity to the precise nature of the test appeared 
to determine the nature of the gender gap (Cassidy 2000), and by the finding that achievement 
gaps can vary considerably depending on whether the assessment is by teacher or task/test. 

3.9 Similar findings apply, where data are available, to differential attainment by ethnic group 
and by economic region. It is not clear why differences between ethnic groups, regions, and 
genders occupy so much commentator attention. The gaps between other social groups, such as 
by first language or between rich and poor, are much larger than the gender gap. Perhaps the 
biggest single gap is between the high and low achievers. The achievement gaps between the top 
and bottom 10% are very large, and completely dwarf any differences between boys and girls. 
However, these gaps are also inherent in the nature of the assessment system. A system that did 
not differentiate at all would be dismissed, but it is clearly possible to change the assessment 
system to reduce the gap in ‘surface’ attainment between any groups. 

3.10 Although the methods used here allow fair comparisons over time and place, there is no 
method suitable for comparing gaps in test scores between different age groups. It is impossible,
for example, to decide whether the gender gap is larger, smaller or the same at Key Stage Four as 
it is at Key Stage Two (although this does not prevent commentators from making spurious 
comparisons based on expected levels). The metrics are not equivalent. However, the gender gap 
in qualifications, such as it is, reverses among adults in later life. 







4. Implications 



4.1 One thrust of this paper has been to suggest that a consideration of standards or effectiveness
is not a simple matter of counting and comparison (Gorard 2000d). Even where simplifying 
assumptions are made about the outcomes from schools, such as a concentration on statutory 
assessment and test results, philosophical and methodological difficulties persist. In light of these 
difficulties, there is certainly no evidence here of falling educational standards over time in 
Britain, no convincing evidence of underperformance relative to the educational systems of other 
developed nations, and no evidence of a highly polarised system. 

4.2 International and LEA-based comparisons do suggest that comprehensive systems of schools 
based on parental choice tend to produce narrower social differences in intake and outcomes. 
Systems with more differentiation lead to greater gaps in attainment between social groups. 
Finland, for example, has a high average reading score, a small gap between high and low
attainers, and comprehensive schools with a policy of choice. Germany, on the other hand, has a
much lower average reading score, a large difference between high and low attainers, and a 
tiered system of selective schooling. The UK is currently still in a reasonable comparative 
position, with a high average reading score, below average differences between high and low 
attainers, and comprehensive schools with a policy of (limited) choice. The lessons for current 
policy are obvious. 

4.3 However, not all commentators are aware of this. There is a common crisis account of the 
position of UK schooling (and there is, perhaps, a tendency for all commentators to decry the 
position of their own countries). For example, Johnson (2002) recently complained that ‘British 
students may be among the world’s highest achievers, as the recent Organisation for Economic 
Co-operation and Development’s PISA study found. But the achievement gap between social 
classes remains one of the biggest in the world’ (p.23). This represents the view of the IPPR - an 
influential centre-left think tank. The introduction of choice policies has, according to this
account, led to a greater polarisation of results. But this greater polarisation by parental 
occupation does not exist in the UK. Something that does not exist cannot, therefore, be the 
result of choice policies. 

4.4 There is no particular urgency about the issue of differential attainment by gender (certainly 
no more so than around 1988 when the only big increase took place) or by any other 
physiological group, and there may be many hidden dangers in tinkering with a school system 
already near 'initiative-overload'. Both the action research approach and the transfer of 
‘successful’ school strategies for raising the attainment of boys might therefore be considered 
wasteful and unnecessary (not to mention inequitable). Since the current gap has existed
since 1988, is not growing, and is much smaller than other systematic gaps (such as those by 
background of student), we can take our time to search for ameliorative solutions if they are 
required. It might, for example, be much simpler to obtain gender neutrality through a 
reconsideration or redesign of the assessment system (whence the gap may have come), than 
through changes in classroom interaction (and similar comments apply to ethnicity). While still
facing potential problems such as teacher supply and inequitable funding arrangements, on any 
rational comparison the UK school system is in the healthiest state ever. Raw score indicators of 
attainment are rising annually, gaps between social groups are reducing, and socio-economic 
segregation between schools has declined. We do not appear to need yet more major
interventions to solve problems that do not exist and that detract from dealing with the problems 
that do. 

4.5 Perhaps the most important conclusions to be drawn are negative ones. The fact that boys 
and girls perform the same at low levels of attainment (or indeed at all levels in some subjects),
coupled with the relative stasis of the gender gap since 1989, suggests that many potential 
explanations are now unworkable. Any useful causal explanation would focus on high, not low, 
level attainment, and suggest an instant one-off impact. Notably therefore, this differential 
attainment is not the result of a cultural change in society, new methods of teaching, seating 
arrangements in schools, mixed-sex classes, boys' laddishness, or poor attendance at school. This 
has serious implications for the conduct of future work, and for the validity of previous work, in 
this area. Longitudinal work with large-scale datasets has elucidated the overall pattern, while 
the action research and the transfer-of-successful-strategies approaches adopted by the DfES
have been unhelpful at this stage. In fact, a considerable amount of public funding is being 
wasted in attempting to solve a specific problem of underachievement at school that does not 
actually exist. 

4.6 Another conclusion to be drawn from this would be that differential attainment by gender is a 
product of the changed system and nature of assessments rather than any more general failing of 
boys, their ability, application, or the competence of those who teach them. Such a conclusion - 
that differences are highly dependent on the nature of assessment - would be supported by the 
recent debate over the apparent improvement in boys' literacy. This improvement was apparently 
the result of sensitivity to the precise nature of the test. It might, for example, be much simpler to 
obtain gender neutrality through a reconsideration or redesign of the assessment system (whence 
the gap may have come), than through changes in classroom interaction. Whatever ameliorative 
strategies are proposed, it would be preferable for them to be considered carefully in light of a 
fuller analysis of differential attainment than hitherto (especially through a consideration of the 
interaction of gender, ethnicity, poverty and so on). This should also be done with the full 
realisation that all such strategies may have longer term impacts on the lives of both men and 
women in adult society. 

4.7 Value-added analyses of individual student performance data have called into question the 
underachievement of large groups of students. 

4.8 The use of school improvement models has led, indirectly, to an overemphasis on the most 
visible indicators of schooling - examination and test scores. There is a considerable danger of 
targets, based on these indicators, determining the practice of organisations. The use of test 
scores leads to three related problems. It may marginalise other purposes and potential benefits 
of schooling. In addition, it suggests that variations in the scores themselves are the product of 
school effects when the evidence clearly shows otherwise. It also neglects the fact that the scores 
themselves are artificial, and technically difficult to compare fairly over time or place. Our
current examination system was designed to differentiate between candidates. If it did not do so, 
it would be rejected, presumably, as ineffective. We cannot, logically, use this differentiation per 
se as evidence of underachievement. 

4.9 All of the findings from the kind of studies described here, and smaller-scale studies of 
classroom processes, will remain open to dispute until a decision is made to demand definitive 
experimental testing of the possible determinants of achievement. Meanwhile, we are often left
with mere pseudo-explanations such as sex, region or social class. Even if we could show that 
sex was a cause of a level of achievement, we could not adjust the sex of individuals to 
ameliorate low achievement. Even if we find that achievement varied by region, we would be 
foolish to believe that transplanting populations between regions would be practical or effective 
(and so on). This is what we mean by ‘pseudo-explanations’. 

4.10 General improvements in the standards of, and outcomes from, education appear to be 
reducing the educational inequalities between different social groups and geographical regions. 
Kelsall and Kelsall (1974) present some evidence that the gap between the top and bottom of the 
social scale in economic, power and status terms was being reduced by the 1970s. Although 
inequality and injustice for the socially disadvantaged have always existed (MacKay 1999), in
fact, 'if you take a long-term historical perspective of the provision of education in the UK
throughout its entire statutory period... you could say that a constant move towards greater 
justice and equity has been the hallmark of the whole process' (p.344). If so, good. 